Method and apparatus for producing a benchmark application for performance testing

ABSTRACT

A method of producing a benchmark application for testing input/output (I/O) settings of a computer application, the method comprising: compiling trace data relating to operations to be executed by the computer application; grouping the trace data into one or more phases, based on different stages in the execution of the computer application to which the operations relate; identifying patterns in the trace data and comparing the patterns; producing simplified trace data in which trace data having similar patterns are combined; and outputting a benchmark application which includes the simplified trace data and information indicating where the trace data have been combined.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of European Application No. 14193482.8, filed Nov. 17, 2014, in the European Intellectual Property Office, the disclosure of which is incorporated herein by reference.

BACKGROUND

1. Field

The present invention relates to testing input/output (I/O) settings in a computer-executable application and, in particular, to a method and apparatus for producing a benchmark application, relating to the computer-executable application, for testing the I/O settings of the computer-executable application.

2. Description of the Related Art

There is an ever-increasing amount of parallelism in the computational components in successive generations of High Performance Computing (HPC) architectures. I/O hierarchies are becoming more and more complex in order to meet the storage requirements of applications (programs) running on such systems.

Writing applications that efficiently utilize the storage hierarchy benefits from an in-depth knowledge of each layer of the I/O stack and an understanding of how the choice of different I/O programming styles, Application Programming Interface (API) libraries, and file system settings impacts application performance.

Two approaches exist for understanding and testing the performance impact of these choices.

The first approach is to run the application with a parameter sweep of the various I/O settings. This gives an accurate picture of how the different settings affect performance. However, it is impractical and time-consuming, as the full application needs to be run for each test.

The second approach is to run a lightweight I/O benchmark suite. Such suites have been developed to run small timing tests that are representative of the I/O operations performed by the application. The problem with such benchmark suites is that they do not exactly match the operations performed by the application and are generally a very simplified representation of the application, in terms of the level of detail available for testing each operation. Also, they do not always capture the full set of operations involved; for example, any communications necessary for gathering data before performing a write operation are often not captured.

I/O characteristics are becoming an increasingly significant bottleneck in the performance of parallel processing in parallel systems and parallel scientific applications. It is therefore important to ensure that an application uses efficient I/O settings to support scalable data movement between disks and distributed memories.

This is a difficult task, as scientific applications, for example, have a wide range of I/O requirements depending on their access patterns, the types and sizes of files that are read and written, and the transaction size (the amount of data transferred each time by a process).

A survey of the I/O requirements of High Performance Computing (HPC) applications revealed the following:

-   Applications are mostly dominated by sequential reads and writes; random access of file data is rare.
-   Append-only write operations are the main type of I/O operation.
-   The I/O transaction sizes (the amount of data to be transferred each time by a process) vary widely, generally ranging from several kilobytes to tens of megabytes.
-   Many applications have adopted one-file-per-process for I/O rather than parallel I/O.
-   Most applications use their own file formats rather than portable, self-describing formats, such as HDF5 or NetCDF.

In addition to the I/O programming choices in the applications, there is a range of hardware and file system options available which can have an impact on the overall I/O performance of an application.

It is therefore desirable to improve the speed of testing while reducing the processing burden and inaccuracies when carrying out such tests.

SUMMARY

Additional aspects and/or advantages will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the invention.

In an embodiment of a first aspect of the present invention there is provided a method of producing a benchmark application for testing input/output (I/O) settings of a computer application. The method comprises the steps of: compiling trace data relating to operations to be executed by the computer application; grouping the trace data into one or more phases, based on different stages in the execution of the computer application to which the operations relate; identifying patterns in the trace data and comparing the patterns; producing simplified trace data in which trace data having similar patterns are combined; and outputting a benchmark application which includes the simplified trace data and information indicating where the trace data have been combined.

Compiling trace data involves copying (tracing) data of operations (processes) involved in the execution of the application. The copied data may be called a “trace” application. One advantage of taking a trace is that the data (trace data) may be manipulated and tested without affecting the original data, so that the results of any testing may be applied subsequently to the original data once the settings producing the desired effect have been determined.

Preferably, trace data may be compiled relating only to the I/O operations performed by the application. Such I/O operations may include reading and writing instructions to a disk as well as operations associated with the I/O operations, such as inter-process communication to gather data for the writing process. Tracing only the I/O operations in this way has the advantage that less time is required to gather the initial trace information and less storage is required to hold the trace data. If trace data is compiled relating to all operations of the application, a step of filtering out (identifying) the I/O operations may be implemented before the phase analysis stage.

The trace data are then divided into groups (corresponding to phases) based on different stages in the execution of the computer application to which the data relate. That is to say, some of the data will relate to operations involved in a specific phase, e.g. an initialization phase in which the operations, and thus the corresponding data, relate to the initialization of the computer application. Typical operations associated with an initialization phase may include read operations. Further phases into which the data may be grouped include, for example, a time-stepping phase and a finalization phase. These are explained in more detail below. A computer application may, for example, have only a single phase, in the sense that the operations involved in executing the computer application are all closely related and are therefore all grouped together. For example, the single phase may be the I/O phase of an application. In this case, the I/O operations from the I/O trace that relate to the same file (or files) and occur during a defined execution window can be grouped together. In another example, multiple I/O phases may exist for specific events, such as files read for initialization, writing of status files during an iteration, and a final writing of data during the close-down stage.
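By way of illustration only, grouping I/O trace events into phases by file and execution window might be sketched in Python as follows. The `IOEvent` record, its fields and the two grouping rules (a new phase starts when the traced file changes, or when the gap between consecutive events exceeds a chosen `window`) are assumptions made for this sketch, not features taken from the embodiment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IOEvent:
    time: float   # seconds since the start of the run (assumed field)
    rank: int     # process (e.g. MPI rank) issuing the operation
    op: str       # "read", "write", "bcast", ...
    file: str     # file the operation relates to; "" for communication
    nbytes: int   # amount of data transferred

def group_into_phases(events, window):
    """Split a trace into phases (hypothetical grouping rule).

    A new phase starts when the gap since the previous event exceeds
    `window` seconds, or when an event touches a different file from the
    current phase. Events with no file (e.g. a broadcast of data that was
    just read) stay in the current phase.
    """
    phases, current = [], []
    current_file, last_time = None, None
    for ev in sorted(events, key=lambda e: e.time):
        gap_too_large = last_time is not None and ev.time - last_time > window
        new_file = bool(ev.file) and current_file is not None and ev.file != current_file
        if current and (gap_too_large or new_file):
            phases.append(current)
            current, current_file = [], None
        current.append(ev)
        if ev.file:
            current_file = ev.file
        last_time = ev.time
    if current:
        phases.append(current)
    return phases
```

With a read of an input file followed by a broadcast, two status-file writes a few seconds apart, and a late restart-file write, a window of 6 seconds yields an initialization phase, a time-stepping phase and a finalization phase.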

Once the one or more phases are established, the data are then scanned in order to identify patterns in the data. Patterns may include similar operations which are repeated, multiple read and/or write operations to a single file, similar amounts of data involved in a read and/or write operation, etc. Any patterns identified in different phases are then compared. In the event of only a single phase, this comparison between phases will not take place; instead, a comparison of the patterns identified will still be carried out, and I/O benchmarks may be created and run to find appropriate settings for that phase.

The phases may be tested individually so as to avoid unnecessary processing. When similar patterns exist in different phases, the test results relating to patterns existing in a phase which has been tested may simply be applied to the similar patterns in subsequent phases, thus saving on unnecessary processing. Patterns within a single phase may also be compared, particularly in the event of only a single phase, as mentioned above and described in more detail below. The comparison between trace data may take into account the various fields in the data. The number of fields that are the same between the compared data may determine the level of similarity between the data. For example, if the process I/O operation (i.e. read/write operation), file size and processes involved in two phases are the same, then these I/O phases can be classed as being the same. In this case the benchmark suite (described further below) would be run for one phase but the results would apply to both phases. Further to this, if the file name(s) used for the two phases are also the same and both phases occur within a relatively short interval, then this might point to an inefficient I/O pattern. In this case a new benchmark suite may be generated in which the I/O operation is performed for twice the file size.
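The field-based comparison described above can be illustrated with a minimal Python sketch. The `PhaseSignature` fields, the fraction-of-matching-fields score and the time `interval` test are assumptions chosen for the example; the embodiment does not prescribe a particular scoring rule.

```python
from dataclasses import dataclass, fields, replace

@dataclass(frozen=True)
class PhaseSignature:
    op: str            # I/O operation, e.g. "read" or "write"
    total_bytes: int   # amount of data moved in the phase
    ranks: frozenset   # processes taking part in the phase
    files: frozenset   # file names accessed in the phase

def similarity(a, b):
    """Fraction of signature fields that are identical in a and b."""
    names = [f.name for f in fields(PhaseSignature)]
    same = sum(getattr(a, n) == getattr(b, n) for n in names)
    return same / len(names)

def same_io_phase(a, b):
    """Same operation, size and processes: benchmark once, apply to both."""
    return a.op == b.op and a.total_bytes == b.total_bytes and a.ranks == b.ranks

def possibly_inefficient(a, t_a, b, t_b, interval):
    """Same phase, same files and close together in time: merge candidate."""
    return same_io_phase(a, b) and a.files == b.files and abs(t_b - t_a) <= interval

def merged_signature(a):
    """Signature for a benchmark performing the I/O for twice the file size."""
    return replace(a, total_bytes=2 * a.total_bytes)
```

Two phases differing only in the file they write score 0.75 and are still treated as the same I/O phase for benchmarking, but are not flagged as a merge candidate.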

The system may keep track of which I/O phase each benchmark is associated with. This includes the case where one benchmark represents two or more phases. This may be implemented by a simple lookup array mapping benchmarks to I/O phases. This provides a simple solution when providing recommendations to developers for optimal I/O patterns, rather than directly feeding the best settings back into the original application.
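A minimal sketch of such a lookup array is given below, assuming phases are compared with a caller-supplied equality test (for example, the field-based test described in the preceding paragraphs). The function names are hypothetical.

```python
def build_phase_lookup(phases, same):
    """Assign each phase to a benchmark.

    Returns (benchmarks, lookup): `benchmarks` lists the index of the phase
    each benchmark is generated from, and lookup[i] is the benchmark that
    covers phase i. A phase that is `same` as an earlier representative
    reuses that benchmark instead of getting its own.
    """
    benchmarks, lookup = [], []
    for i, phase in enumerate(phases):
        for b, rep in enumerate(benchmarks):
            if same(phases[rep], phase):
                lookup.append(b)
                break
        else:
            benchmarks.append(i)
            lookup.append(len(benchmarks) - 1)
    return benchmarks, lookup

def phases_for_benchmark(lookup, b):
    """All phases whose recommendations come from benchmark b."""
    return [i for i, covered_by in enumerate(lookup) if covered_by == b]
```

Keeping the mapping separate from the benchmarks means a recommendation for one benchmark can be reported against every phase it represents.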

Once the similar patterns have been identified and compared with respect to the or each phase, the trace taken of the computer application is then simplified to exclude any repeated patterns; in particular, multiple instances of similar patterns identified between phases may be removed.

Once the trace has been simplified, a benchmark application is produced which includes the simplified trace data along with information indicating how the trace data has been simplified relative to the original trace.

A benchmark application or benchmark suite is a simplified representation of an application (computer application) to be tested, which may be used for such testing. Input/output parameters of a benchmark application are intended to respond in a similar manner to those of the application which the benchmark application represents. In this way, when input/output parameters are changed to determine the effect of such changes on the performance of an application, the changes observed in the benchmark application closely mirror those of the application. This allows factors such as efficiency, throughput and data flow to be tested on the benchmark application, with the results of those tests having direct implications for the corresponding factors of the application.

In essence, in an embodiment, the invention works by performing a trace of an application to capture the I/O and associated operations. The trace data is then processed to identify the I/O phases of the application. An analysis of the I/O phases allows inefficient I/O patterns to be spotted and I/O phases with similar operations to be identified.

Optionally, with this trace and I/O phase information, benchmark tests are then created which test the various I/O settings at each level of the I/O hierarchy. The output of the benchmarks is used to recommend the optimal I/O settings for the application or potential optimizations to the source code to improve performance.

In this way, an embodiment of the invention can provide lightweight and quick I/O performance tests that closely match the I/O operations performed by a computer application.

Optionally, an embodiment further includes testing the benchmark application to establish the performance of benchmark operations of the benchmark application.

Once the benchmark application is created, various tests, in particular tests relating to I/O settings in the benchmark application, may be carried out to establish which changes to I/O settings give improved performance. By carrying out such tests on the benchmark application, and not the original computer application, the time taken to carry out each individual test may be reduced and the processing overall may be simplified, due to the simplified nature of the benchmark application relative to the original application.

Optionally, an embodiment further includes testing I/O settings of the benchmark application to identify inefficiencies in data flow through the system, and/or characteristics of a system, on which the computer application is executed.

The tests carried out may involve changing I/O settings in order to find appropriate settings to improve data flow during execution of the benchmark application, which represents the data flow during execution of the computer application. The characteristics of the system may include data buffer capacity, number of parallel processes being carried out simultaneously, maximum processing speed, accepted data format, etc. This allows I/O settings to be improved in relation to the system and also allows system characteristics which are inefficient to be highlighted.

Optionally, an embodiment further comprises outputting a report indicating performance of the benchmark operations and/or comprising recommendations for I/O settings, specific to each system on which the computer application is operable to be executed, for use in improving data flow.

Once the preferred I/O settings have been tested and established, a report is produced including information indicating which tests were carried out and how. This report can be used by expert technicians to adapt similar applications or similar systems to improve throughput and data flow. By reviewing performance figures for various I/O settings, desired effects may be weighed up against potential drawbacks in order to achieve a system and application which perform optimally, in the context of the performance desired by a programmer. The report may also include recommendations for improving performance, based on historic information collected in relation to the system.

Optionally, in an embodiment, the report comprises recommendations regarding at least one of access patterns, types of files to be read and written, sizes of files to be read and written, transaction sizes, and timings.

If a specific setting or factor is indicated to have a significant effect on performance, this may be highlighted in the report. Similarly, if a combination of factors has a desired effect or is shown to improve performance, this may be highlighted in the report and recommendations on how to maintain this improved performance may be indicated. Regarding I/O data flow, factors such as file size, transaction size (total file transfer size), read and write timings, etc. may have a large influence on performance. Managing these factors appropriately, such that parallel processes are carried out within a desired time frame, may assist in reducing bottlenecking of data flow during processing.
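By way of example only, the highlighting of significant settings in the report could be based on the relative change in benchmark time against a baseline run. The 10% threshold and the setting labels used below are illustrative assumptions, not values from the embodiment.

```python
def significant_settings(baseline_s, timings, threshold=0.10):
    """Flag settings whose benchmark time differs notably from a baseline.

    timings: setting label -> benchmark time in seconds (hypothetical input).
    Returns (label, relative_change) pairs with |change| >= threshold, where
    a positive change means faster than the baseline. Best improvement first.
    """
    flagged = []
    for label, t in timings.items():
        change = (baseline_s - t) / baseline_s
        if abs(change) >= threshold:
            flagged.append((label, change))
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged
```

Reporting both improvements and regressions lets the report warn against settings that harm performance as well as recommend those that help.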

Optionally, in an embodiment, the compiling of the trace data further comprises extracting a chronology of operations that occur during execution of the computer application.

By determining a chronology (the temporal sequence) of operations, the operations may be reorganized so that data flow is improved. For example, if an operation in a sequence of operations is identified as requiring a large amount of data to be read/written from/to a file, that operation may be positioned in the sequence such that other required or important operations are placed ahead of it and operations of lesser importance are positioned behind it. Important operations may include, amongst others, operations on which other operations are dependent, i.e. that must be completed before the dependent operations may be executed. Knowing the chronology of operations in parallel processing systems allows each parallel sequence to be arranged such that there is always the possibility of placing an important operation ‘ahead of the queue’, to be processed next, which would not be possible, for example, if all parallel processors were occupied with processing large operations simultaneously.
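The reordering described above amounts to a dependency-respecting (topological) ordering with a priority rule. The sketch below is an illustration rather than the method of the embodiment: it uses Kahn's algorithm with a heap so that, among operations whose prerequisites are complete, those with the most direct dependents run first and, on a tie, the smallest data transfer runs first, deferring large transfers. The input format (`sizes`, `deps`) is assumed.

```python
import heapq

def order_operations(sizes, deps):
    """Order traced operations (hypothetical input format).

    sizes: op name -> bytes transferred.
    deps:  op name -> set of ops that must complete before it.
    Returns a dependency-respecting order that favours operations with many
    direct dependents and defers large transfers.
    """
    dependents = {op: 0 for op in sizes}
    children = {op: [] for op in sizes}
    remaining = {op: len(deps.get(op, ())) for op in sizes}
    for op, prerequisites in deps.items():
        for p in prerequisites:
            dependents[p] += 1
            children[p].append(op)
    # Heap key: more dependents first, then smaller transfer, then name.
    ready = [(-dependents[op], sizes[op], op) for op in sizes if remaining[op] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, _, op = heapq.heappop(ready)
        order.append(op)
        for child in children[op]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, (-dependents[child], sizes[child], child))
    if len(order) != len(sizes):
        raise ValueError("dependency cycle in trace")
    return order
```

In a trace where a broadcast depends on reading the input and a status write depends on the broadcast, a large restart write with no dependents is placed last.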

Optionally, in an embodiment, the trace data comprises information indicating a type and/or size of the data, relating to the operations, being read or written.

Having knowledge of the type and/or size of the data allows a more comprehensive test of the I/O settings to be carried out, such that a more informative report can be produced. Information indicating data type and/or size may further assist in identifying patterns in the trace data.

Optionally, in an embodiment, the testing of the benchmark application includes studying the effect on performance of changing I/O settings.

When testing performance, random or predetermined changes to the I/O settings may be made in order to determine the corresponding effect on performance. The performance of the benchmark application is considered to correlate directly with the performance of the original computer application.
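A predetermined sweep of I/O settings over the benchmark application might be sketched as below. `run_benchmark` stands for whatever executes one benchmark with a given set of settings; it and the setting names in the usage are hypothetical placeholders.

```python
import itertools
import time

def sweep(run_benchmark, settings_space):
    """Time run_benchmark(**settings) for every combination of settings.

    settings_space: setting name -> list of candidate values.
    Returns (settings, seconds) pairs, fastest first.
    """
    names = list(settings_space)
    results = []
    for values in itertools.product(*(settings_space[n] for n in names)):
        settings = dict(zip(names, values))
        start = time.perf_counter()
        run_benchmark(**settings)
        results.append((settings, time.perf_counter() - start))
    results.sort(key=lambda r: r[1])
    return results
```

For example, `sweep(run_benchmark, {"buffer_kib": [64, 1024], "api": ["posix", "mpiio"]})` would time four combinations; because the benchmark application is small, a full grid of this kind remains practical where sweeping the original application would not be.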

Optionally, in an embodiment, where trace data having similar patterns have been combined, the testing of the benchmark application involves only testing a first instance of the trace data having a similar pattern, and the result of testing that first instance is then applied to subsequent instances.

By avoiding repetition of similar tests on similar trace data patterns, the time taken to complete testing can be reduced without compromising the accuracy of the test results.

Optionally, in an embodiment, the trace data are grouped into at least an initialization phase, relating to reading input files, and a finalization phase, relating to writing re-start files, which allow operations to re-start following completion of testing.

An initialization phase and a finalization phase are typical of most computer applications. Many other phases may exist in addition to these two phases. Each phase is intended to group operations related to similar functions in the context of executing the computer application.

Optionally, in an embodiment, comparing patterns between phases comprises identifying phases including similar sets of operations which each involve accessing the same file.

One example of patterns which are likely to give similar results during testing is when similar operations or sets of operations involve accessing the same file. These may therefore be combined in the simplified trace data such that, when the benchmark application including the simplified trace data is tested, multiple instances of such patterns are taken to give the same testing results. This means that testing is only carried out on the first such instance and, in place of further testing of subsequent instances, the results of testing the first instance are applied to subsequent instances of such patterns. This saves time and reduces the processing burden.
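Reusing the result from a first instance can be sketched as a simple cache keyed on the pattern. Here the pattern key (for instance, operation and file name) and the `run_test` callable are assumptions for illustration.

```python
def benchmark_with_reuse(instances, key_of, run_test):
    """Test only the first instance of each pattern and reuse its result.

    instances: traced operations (or operation sets) in order.
    key_of:    maps an instance to a hashable pattern key (assumed).
    run_test:  runs the benchmark for one instance and returns a result.
    Returns (results per instance, number of tests actually run).
    """
    cache = {}
    results = []
    for inst in instances:
        key = key_of(inst)
        if key not in cache:
            cache[key] = run_test(inst)  # first instance: actually tested
        results.append(cache[key])       # later instances: result reused
    return results, len(cache)
```

Two status-file writes with the same key are tested once, so three traced instances cost only two benchmark runs.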

Optionally, in an embodiment, the testing is carried out on the benchmark application in a benchmark suite and comprises inputting the benchmark application into the benchmark suite along with a user-modifiable input file, which sets control parameters of the benchmark suite.

In this embodiment, two files, which control the operation of the benchmark suite, may be fed into the benchmark suite.

The first file is the benchmark application with the simplified trace data and information indicating where trace data have been combined, wherein the simplified trace data includes details of the phases. This helps to define the type of tests to be run by the benchmark suite. The second file is a general input file which is able to set more fine-grained control parameters for the tests.

Optionally, in an embodiment, the user-modifiable input file designates the phases to be tested and/or specifies a target execution time for each benchmark suite, and/or identifies trace data relating to any operations to be excluded from the testing.

This general input file can be generated automatically and can be modified by the user. The input file can be used:

-   to enable or disable the testing of different phases in the benchmark application;
-   to select which elements of the benchmark application are eligible for a performance test;
-   to specify additional settings such as:
    -   any additional I/O-specific tests to run;
    -   a target execution time (completion time) for each benchmark suite (test procedure).
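Purely as an illustration, an input file of this kind could be held in an INI-style layout and read with Python's standard `configparser` module. The section and key names below (`phases`, `target_time`, `extra_tests`, `exclude`) are hypothetical and are not the format of FIG. 11.

```python
import configparser
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ControlParameters:
    enabled_phases: List[str]       # phases whose benchmarks should run
    target_time_s: Optional[float]  # target execution time per benchmark suite
    extra_tests: List[str]          # additional I/O-specific tests
    excluded_ops: List[str]         # operations to exclude from testing

def _split(value):
    return [item.strip() for item in value.split(",") if item.strip()]

def parse_control_file(text):
    """Read the hypothetical INI-style control file into ControlParameters."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    enabled = []
    if cfg.has_section("phases"):
        enabled = [name for name in cfg.options("phases")
                   if cfg.getboolean("phases", name)]
    return ControlParameters(
        enabled_phases=enabled,
        target_time_s=cfg.getfloat("settings", "target_time", fallback=None),
        extra_tests=_split(cfg.get("settings", "extra_tests", fallback="")),
        excluded_ops=_split(cfg.get("settings", "exclude", fallback="")),
    )

EXAMPLE = """
[phases]
initialization = on
timestep = off
finalization = on

[settings]
target_time = 60
extra_tests = stripe_size, collective_buffering
exclude = bcast
"""
```

Because every key has a fallback, an automatically generated file can omit any setting the user has not changed.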

Optionally, in an embodiment, producing the simplified trace data involves maintaining a first instance of a specific pattern in the simplified trace data and replacing subsequent instances, having similar patterns to the first instance, with an indication that the subsequent instance has been combined with the first instance.
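One possible representation of the simplified trace is sketched in Python below, under the assumption that each trace entry can be reduced to a hashable pattern key. It keeps the first instance of each pattern and replaces later instances with a `combined_with` indication pointing at it; the second function shows how the untested indications can take the result of the first instance. The dictionary layout is an assumption for the sketch.

```python
def simplify(trace, key_of):
    """Keep first instances; replace later similar ones with an indication."""
    first_index = {}
    simplified = []
    for item in trace:
        key = key_of(item)
        if key in first_index:
            # Later similar instance: keep only a pointer to the first one.
            simplified.append({"combined_with": first_index[key]})
        else:
            first_index[key] = len(simplified)
            simplified.append({"pattern": item})
    return simplified

def resolve_results(simplified, tested):
    """tested: index -> result for entries kept as patterns.

    Indications are not tested; each takes the result of the entry it
    points to.
    """
    return [tested[i] if "pattern" in entry else tested[entry["combined_with"]]
            for i, entry in enumerate(simplified)]
```

For a trace of patterns A, B, A, A, only entries 0 and 1 need testing; entries 2 and 3 take the result of entry 0.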

When combining trace data having similar patterns, an indication may be provided of how the trace data are combined. Such indications are not tested and may indicate the result of testing the first instance of the trace data pattern, which is to be taken as the result for the trace data which has been replaced with the indication.

In an embodiment of a second aspect of the present invention, there is provided an apparatus for producing a benchmark application for testing input/output (I/O) settings of a computer application, the apparatus comprising: means for compiling trace data relating to operations to be executed by the computer application; means for grouping the trace data into one or more phases, based on different stages in the execution of the computer application to which the operations relate; means for identifying patterns in the trace data and comparing the patterns; means for producing simplified trace data in which trace data having similar patterns are combined; and means for outputting a benchmark application which includes the simplified trace data and information indicating where the trace data have been combined.

In any of the above aspects, the various features may be implemented in hardware, or as software modules running on one or more processors. Features of one aspect may be applied to any of the other aspects. The invention also provides a computer program or a computer program product for carrying out any of the methods described herein, and a computer-readable medium having stored thereon a program for carrying out any of the methods described herein. A computer program embodying the invention may be stored on a computer-readable medium, or it could, for example, be in the form of a signal such as a downloadable data signal provided from an Internet website, or it could be in any other form.

BRIEF DESCRIPTION OF THE DRAWINGS

These and/or other aspects and advantages will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, of which:

FIG. 1 is a flowchart showing the method of producing a benchmark application according to an embodiment;

FIGS. 2A and 2B are block diagrams showing, respectively, a processor and a processing apparatus according to an embodiment;

FIG. 3 is a block diagram showing an exemplary I/O system infrastructure;

FIG. 4 is a flowchart showing in detail the steps of testing a benchmark application;

FIG. 5 is a source code snippet showing an exemplary initialization phase;

FIG. 6 is a code sample relating to a parallel application using serial I/O by process 0 to perform I/O operations;

FIG. 7 is a code sample showing the I/O trace file, formatted as an XML file, generated from running the sample application of FIG. 6 with four MPI processes;

FIG. 8 is an example of similar patterns being combined according to an embodiment;

FIG. 9 is an example of an inefficient pattern in source code;

FIG. 10 is the trace file generated for the sample code of FIG. 9;

FIG. 11 is an example input file to specify control parameters for the benchmark suite, according to an embodiment;

FIG. 12 is a flowchart showing steps involved in carrying out the benchmark suite;

FIG. 13 is an example of pseudo-code for the initialization phase of the sample code shown in FIG. 6;

FIG. 14 is a flowchart showing the process and the types of information saved after

FIG. 15 is a flowchart showing the result analysis and report generation according to an embodiment.

DETAILED DESCRIPTION

Reference will now be made in detail to the embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. The embodiments are described below to explain the present invention by referring to the figures.

Testing of a computer application, such as testing input/output (I/O) settings of an application in a high performance computing architecture involving parallel processing, may be carried out by the following method.

FIG. 1 shows a simplified flowchart depicting the main steps involved in creating the benchmark application, which is closely related to a corresponding computer application, for the purpose of carrying out testing of various features of the benchmark application, including input/output (I/O) settings.

The method involves identifying a computer application which is to be tested. Such a computer application will typically include many operations to be executed, some of which may be very different while others may vary only in minor details. Once a computer application is identified, a trace of the computer application is taken (S101). Taking a trace involves compiling trace data relating to the operations of the computer application. Trace data is in essence a copy of the original data relating to the operations. Trace data may, however, include information indicating the type, size, format, etc. of the original data, which may substitute for, or be provided in addition to, the copy of the original data. If desired, only data relating to I/O settings and processes may be copied to produce the trace data.

Once the trace data has been compiled, it is then grouped into phases (S102). Each phase relates to one of the stages involved in the execution of the computer application. Phases may be set based on predetermined associations between operations in the computer application, or may be set based on user input. Phases may further be divided into sub-phases in order to aid processing if, for example, a phase is of a particularly large size.

The trace data is then scanned to identify patterns in the trace data (S103). Patterns may, for example, be similar sequences of operations, operations involving accessing the same file, or operations involving the transfer of a similar amount of data. Operations in this context include single operations and/or sequences of operations.

Once patterns in the trace data have been identified, the patterns are compared in order to identify similarities and similar overall patterns. Such patterns are compared between phases to identify any operations which are carried out in one phase and then repeated in a subsequent phase. If only a single phase is present, a comparison is made between sub-phases within that phase.

Based on the comparisons made, duplicates, or effectively multiple instances, of similar patterns are identified. The trace data is then simplified by combining the identified similar patterns to produce simplified trace data (S104). Simplified trace data includes less data to be tested when compared with the trace data. The simplified trace nevertheless still represents the trace data, in that the simplified trace data includes information indicating how and where patterns have been combined. Testing carried out on the simplified trace data tests each combined pattern once, and the result is taken to apply to every instance that the combined pattern represents. This means that re-running the tests at each instance of the similar patterns (as would be the case for the trace data) is not necessary.

Once the simplified trace data is produced, the simplified trace data and the related information are output in the form of a benchmark application (S105). A benchmark application is an application, or representative application, on which testing may be carried out.

FIG. 2A shows an exemplary embodiment of the apparatus (processor) according to the invention on which the method shown in FIG. 1 may be carried out. In this embodiment, the processor 100 comprises a trace data compiler 101, a trace data grouper 102, a pattern identifier 103, a comparer 104 and a simplified trace data producer 105. The depicted arrows show the flow of data. FIG. 2B shows an alternative embodiment of the apparatus (processing apparatus 200) according to the invention on which the method shown in FIG. 1 may be carried out. In this embodiment, the processing apparatus 200 comprises a trace data compiling unit (means for compiling trace data) 201, a trace data grouping unit (means for grouping trace data) 202, a pattern identifying unit (means for identifying patterns) 203, a simplified trace data producing unit (means for producing simplified trace data) 204 and a benchmark application output unit (means for outputting a benchmark application) 205. The depicted arrows show the flow of data.

An example of the architecture of a system on which an application, according to the present invention, is to be executed is shown in FIG. 3. FIG. 3 also shows some of the I/O design choices available for an application at three programming layers: the application programming layer, the I/O programming interface layer, and the file system layer.

Application Layer: At this layer one of three exemplary I/O programming styles may be used: serial I/O, where all data is gathered to a root process which then writes the data; parallel I/O, whereby each process concurrently writes its portion of data to a disk (this data can be written either to a single shared file or to a unique file for each process); and I/O servers, where one or more processes are assigned to exclusively handle I/O operations, allowing write operations to be performed asynchronously by these processes while other processes continue with the calculation. The design decisions at the application programming layer also determine the file access pattern and transaction sizes (the amount of data transferred to and from disk by each process); for example, write buffers may be used to ensure that large, sequential write operations are performed.
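The write-buffer technique mentioned for the application layer can be sketched as follows: many small application-level writes are accumulated in memory and issued to the file as a few large sequential writes. The class name and the flush threshold are assumptions for illustration; the buffer size is the kind of I/O setting a benchmark could vary.

```python
import io

class AggregatingWriter:
    """Collects small writes and issues them to `raw` as large sequential writes."""

    def __init__(self, raw, buffer_bytes):
        self.raw = raw                    # any binary file-like object with .write()
        self.buffer_bytes = buffer_bytes  # flush threshold (a tunable I/O setting)
        self.pending = bytearray()
        self.flushes = 0                  # number of writes issued to `raw`

    def write(self, data):
        self.pending += data
        if len(self.pending) >= self.buffer_bytes:
            self.flush()

    def flush(self):
        if self.pending:
            self.raw.write(bytes(self.pending))
            self.pending.clear()
            self.flushes += 1

if __name__ == "__main__":
    sink = io.BytesIO()
    w = AggregatingWriter(sink, buffer_bytes=1024)
    for _ in range(100):
        w.write(b"x" * 100)   # 100 small application-level writes
    w.flush()
    print(len(sink.getvalue()), w.flushes)
```

With a 1024-byte threshold, the 100 writes of 100 bytes in the demonstration reach the underlying file as 10 writes, and the program prints `10000 10`.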

I/O Interface Layer: At the I/O programming interface layer there are a number of application programming interfaces (APIs) available, including the POSIX I/O interface for binary file I/O; the MPI-I/O interface (MPI: Message Passing Interface), which is built on top of the POSIX interface to give a parallel I/O programming style; and portable APIs such as netCDF and HDF5, which provide self-describing, machine-independent data formats.

File System Layer: The available hardware and system configuration determine the available options at the file system layer. Direct attached file systems, with disks connected directly to the compute nodes, give a non-scalable and non-parallel setup. Network file systems (e.g. NFS) are a more scalable solution but, as there is only one file system server, there is a bottleneck at this server. Clustered file systems (e.g. GPFS), similar to network file systems but with several servers instead of just one, allow different clients to write to different servers. However, there is a bottleneck at the server when a client writes one large file, as this cannot be split between servers. Distributed file systems (e.g. Lustre, FEFS) are fully parallel file systems allowing scaling across multiple servers for both individual files and multiple clients.

As shown above, there are many choices to consider when designing and implementing an application's I/O procedures. It is difficult to know the optimal settings without knowing the programming flow of the application and the I/O characteristics of the system on which the application is running. Moreover, it is impractical to run fully-fledged applications to test and evaluate I/O settings and solutions.

The present invention, in an embodiment, addresses these issues by extracting and identifying the details of the main I/O operations from an application. This information is analyzed to identify the main I/O phases in the application (e.g. an initialization phase, which comprises reading an input file and broadcasting the data to all computation processes) and to spot any inefficient I/O patterns (e.g. many small read/write operations to the same file instead of a single larger read/write). The I/O phase information is then parameterized and fed into a benchmark suite which tests a range of I/O options at each of the three I/O layers shown in FIG. 3. These features allow the optimal I/O settings for the application running on a given system to be found in a thorough and efficient manner.

The present invention, according to an embodiment, helps to identify the optimal I/O settings for an application running on a given HPC system by simulating the I/O pattern of the application with different settings using the I/O benchmark suite.

FIG. 4 shows a more detailed flowchart of the method according to an embodiment of the invention. In particular, this flowchart shows the steps involved in testing a benchmark application.

In the embodiment shown in FIG. 4, a trace is taken of the I/O operations performed by an application during execution (i.e. a chronology of when and where I/O events occurred is recorded). The trace data is then processed to identify the I/O phases of the application. The different I/O phases are then analyzed and compared (see the “I/O phases” section below for further detail).

This allows any inefficient I/O patterns in the application to be spotted, and different I/O phases with similar operations to be identified, so as to reduce the number of tests performed in the subsequent benchmarking stage. The trace and I/O phase data are then fed into a benchmark suite which emulates the I/O elements of the application (e.g. disk access pattern, file type, file size, transaction size). Using this information, the benchmark suite studies the performance effect of changing settings, for example across each of the three I/O layers shown in FIG. 3.

Upon completion of the benchmark suite for each phase, the optimal I/O settings for the application may be identified. Recommendations on the settings for application execution or for potential code optimizations can then be reported.

In the context of FIG. 4, an I/O phase may group the operations required to access a single file or a group of related files at a particular stage of an application's execution. As well as system I/O operations (e.g. read( ), write( )), an I/O phase may also include any associated operations, such as the communication required in gathering or scattering the data.

For scientific applications, I/O phases may include: an initialization phase, in which an input file is read; a time-stepping phase, in which measurements or statistics for the scientific system are periodically written to file; and a finalization/close-down phase, in which final measurement files may be written for post-processing and restart files are written to allow a simulation to continue from where it left off.

FIG. 5 shows a sample source code snippet for reading an input file and distributing data to each process. The phase includes scattering of data from process 0 to all other processes, as well as opening and reading the file. All lines of the sample code shown in FIG. 5 would be classed as a single I/O phase (e.g. the initialization phase). The operations that comprise this phase are the open( ) and read( ) operations performed by process 0 and the MPI_SCATTER( ) operation called by all processes.

Some of the method steps according to an embodiment of the invention are described in more detail below.

Trace application: An initial step involves an I/O tracing utility. This utility may work on a running application. It extracts and saves the chronology of I/O operations that occur during execution. The information captured may include the type of data (the API used), the size of the data read or written, and which processors took part in the operation. Associated operations, such as any communications required in gathering or scattering the data, can also be captured.
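For illustration, a tracing utility can be approximated by wrapping file objects so that every operation is logged with a time stamp, process rank, API, file and size. This is a hypothetical sketch (`TraceEvent`, `Tracer` and `TracedFile` are invented names). A production tool would more likely intercept the I/O library calls themselves rather than require the application to use a wrapper.

```python
import io
import time
from dataclasses import dataclass, field


@dataclass
class TraceEvent:
    t: float      # time stamp from a monotonic clock (seconds)
    rank: int     # process that issued the operation
    api: str      # API used, e.g. "POSIX"
    op: str       # operation, e.g. "open", "read", "write"
    path: str     # file accessed
    nbytes: int   # bytes read or written


@dataclass
class Tracer:
    events: list = field(default_factory=list)   # chronological event log

    def record(self, rank: int, api: str, op: str, path: str, nbytes: int) -> None:
        self.events.append(TraceEvent(time.monotonic(), rank, api, op, path, nbytes))


class TracedFile:
    """Wrap a binary file object and log each operation to a Tracer."""

    def __init__(self, f, path: str, rank: int, tracer: Tracer):
        self.f, self.path, self.rank, self.tracer = f, path, rank, tracer
        tracer.record(rank, "POSIX", "open", path, 0)

    def read(self, n: int) -> bytes:
        data = self.f.read(n)
        self.tracer.record(self.rank, "POSIX", "read", self.path, len(data))
        return data

    def write(self, data: bytes) -> int:
        n = self.f.write(data)
        self.tracer.record(self.rank, "POSIX", "write", self.path, n)
        return n


tracer = Tracer()
f = TracedFile(io.BytesIO(bytes(256)), "input.dat", rank=0, tracer=tracer)
f.read(128)
f.read(128)
print([(e.op, e.nbytes) for e in tracer.events])
# [('open', 0), ('read', 128), ('read', 128)]
```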

I/O phase discovery: The output of the tracing utility is analyzed so as to characterize and split I/O operations into phases (e.g. initialization phase, time-stepping phase, and finalization/close-down stage) depending on the application execution. The I/O phase information is then written to an I/O trace report file. An example parallel application which uses serial I/O operations is shown in FIG. 6. This example program has two I/O phases: the initialization stage, which reads a restart file to set up the parameters for the calculation, and the calculation stage, which writes out restart files at regular time-step intervals.
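One simple way to split a trace into phases is sketched below in Python. It assumes a heuristic that the text does not prescribe: a new phase begins whenever the accessed file changes or the idle gap between operations exceeds a threshold. The `Event` record, the `max_gap` value and the file names are hypothetical, chosen to mimic the FIG. 6 program (one restart-file read, then periodic restart dumps).

```python
from typing import NamedTuple


class Event(NamedTuple):
    t: float      # seconds since start of execution
    rank: int     # process that issued the operation
    op: str       # operation name
    path: str     # file accessed
    nbytes: int   # bytes read or written


def discover_phases(events, max_gap: float):
    """Split a trace into phases: start a new phase when the file changes
    or the idle gap since the previous operation exceeds `max_gap`."""
    phases, current = [], []
    for ev in sorted(events, key=lambda e: e.t):
        if current and (ev.path != current[-1].path
                        or ev.t - current[-1].t > max_gap):
            phases.append(current)
            current = []
        current.append(ev)
    if current:
        phases.append(current)
    return phases


trace = [
    Event(0.0, 0, "open", "restart.in", 0),
    Event(0.1, 0, "read", "restart.in", 256 << 20),
    Event(50.0, 0, "open", "restart_1.out", 0),
    Event(50.2, 0, "write", "restart_1.out", 256 << 20),
    Event(100.0, 0, "open", "restart_2.out", 0),
    Event(100.2, 0, "write", "restart_2.out", 256 << 20),
]
for i, phase in enumerate(discover_phases(trace, max_gap=5.0)):
    print(i, phase[0].path, [e.op for e in phase])
```

Each periodic restart dump becomes its own phase here, which is what the later phase-analysis step then recognizes as similar.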

FIG. 7 shows an example of an I/O trace report file generated after the I/O phase discovery stage for the sample application when run with 4 processes and an input file of 256 MB. As an example, the trace file is formatted as an XML file. Other output file formats are also possible; the main requirement for the trace file is that it can be read by the I/O benchmark suite. There is an entry in the trace report file for each I/O phase identified, and each phase includes the gathering and scattering of data where appropriate.
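Such a report can be produced with Python's standard `xml.etree.ElementTree` module. FIG. 7 is not reproduced here, so the element and attribute names below (`io_trace`, `phase`, `operation`) are assumptions rather than the schema of the figure. The byte counts correspond to a 256 MB input scattered over 4 processes.

```python
import xml.etree.ElementTree as ET


def phases_to_xml(phases: dict) -> str:
    """Serialize phases to an XML trace report.

    `phases` maps a phase name to a list of (rank, op, path, nbytes) tuples.
    Element and attribute names are illustrative, not the FIG. 7 schema.
    """
    root = ET.Element("io_trace")
    for name, ops in phases.items():
        ph = ET.SubElement(root, "phase", name=name)
        for rank, op, path, nbytes in ops:
            ET.SubElement(ph, "operation", type=op, rank=str(rank),
                          file=path, bytes=str(nbytes))
    return ET.tostring(root, encoding="unicode")


xml_text = phases_to_xml({
    "initialization": [
        (0, "open", "restart.in", 0),
        (0, "read", "restart.in", 256 * 2**20),
        (0, "scatter", "", 64 * 2**20),   # 256 MB split across 4 processes
    ],
})
print(xml_text)
```

Any format the benchmark suite can parse would serve equally well; XML is used here only because the text gives it as the example.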

To implement this feature, it is desirable for the trace utility to track the operations before and after the I/O instructions that access files, so that any associated operations (e.g. communication) are found. One possible way to do this is to enable trace capturing based on tracking the memory reference of a buffer involved in the I/O instruction. In this way, any communications involving the buffer are captured and added to the trace output. To aid more accurate identification of I/O phases, a time window could be placed before and after the I/O instructions.
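The buffer-tracking idea can be sketched as follows, assuming each traced operation carries an identifier for the memory buffer it touches (for example, its address). Communication events are attached to an I/O operation only if they use the same buffer and fall inside a time window around it. The `Op` record, the `window` value and the sample events are illustrative.

```python
from typing import NamedTuple


class Op(NamedTuple):
    t: float      # time stamp in seconds
    kind: str     # "io" or "comm"
    name: str     # e.g. "read" or "MPI_Scatter"
    buffer: int   # identifier of the memory buffer used (e.g. its address)


def attach_comms(ops, window: float) -> dict:
    """Map each I/O op to the comm ops that use the same buffer within
    `window` seconds before or after it."""
    io_ops = [o for o in ops if o.kind == "io"]
    comms = [o for o in ops if o.kind == "comm"]
    return {
        io_op: [c for c in comms
                if c.buffer == io_op.buffer and abs(c.t - io_op.t) <= window]
        for io_op in io_ops
    }


trace = [
    Op(1.0, "io", "read", 0x1000),
    Op(1.2, "comm", "MPI_Scatter", 0x1000),   # same buffer, inside window
    Op(1.3, "comm", "MPI_Bcast", 0x2000),     # different buffer
    Op(30.0, "comm", "MPI_Scatter", 0x1000),  # same buffer, outside window
]
groups = attach_comms(trace, window=1.0)
print([c.name for c in groups[trace[0]]])   # ['MPI_Scatter']
```

Combining both criteria avoids attaching unrelated traffic that merely happens to be close in time, or that reuses the buffer much later.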

I/O phase analysis: Once the I/O phases have been identified from the application trace and the details written to the I/O trace file, the trace file is analyzed. During the analysis, the different I/O phases in the file are compared to find phases that are composed of similar operations.

This comparison has two purposes. Firstly, it reduces the number of benchmarks that are run by finding I/O phases that use the same set of operations, involve the same or similar processes and read/write similar amounts of data. This helps to reduce the turnaround time of the benchmark suite. FIG. 8 shows how the I/O phase analysis works on the I/O trace file generated for the code shown in FIG. 6, by combining the similar restart_dump_1 and restart_dump_2 phases into a single restart_dump phase.

Secondly, it identifies inefficient I/O patterns in the application. This is done by finding successive I/O phases that involve the same set of operations and the same set of processes and that access the same file. From this information, a new I/O phase is created which combines these phases into a single phase with larger I/O operations. This new phase is added to the I/O trace file and is benchmarked to compare the performance effect. A sample source code snippet and the corresponding I/O phase trace file for an example of an inefficient I/O pattern are shown in FIG. 9 and FIG. 10, respectively. Data from a restart file is read in small chunks by one process and then sent to one of the processes. The right-hand text box in FIG. 10 shows an additional I/O phase generated during the I/O Phase Analysis stage after the potentially inefficient I/O pattern has been identified. This new phase groups the operations so that data is read and sent to remote processes in larger chunks.
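Both purposes of the comparison can be sketched in Python. This is a minimal sketch under stated assumptions: a phase is reduced to its operation sequence, participating processes, file and total bytes, and "similar amounts of data" is taken to mean within a 10% relative tolerance. `Phase`, `similar`, `combine_similar`, `coalesce_successive` and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    ops: tuple          # operation sequence, e.g. ("open", "write")
    ranks: frozenset    # processes taking part
    path: str           # file accessed
    nbytes: int         # total bytes read or written


def similar(a: Phase, b: Phase, tol: float = 0.1) -> bool:
    """Same operations and processes, data volumes within relative `tol`."""
    if a.ops != b.ops or a.ranks != b.ranks:
        return False
    return abs(a.nbytes - b.nbytes) <= tol * max(a.nbytes, b.nbytes)


def combine_similar(phases):
    """Keep the first phase of each similar group; record what the rest map to."""
    kept, combined_into = [], {}
    for p in phases:
        match = next((k for k in kept if similar(k, p)), None)
        if match is None:
            kept.append(p)
        else:
            combined_into[p.name] = match.name
    return kept, combined_into


def coalesce_successive(phases):
    """Build new phases that merge runs of successive phases with the same
    operations, processes and file into one larger transfer."""
    runs = []
    for p in phases:
        prev = runs[-1][-1] if runs else None
        if prev is not None and (prev.ops, prev.ranks, prev.path) == (p.ops, p.ranks, p.path):
            runs[-1].append(p)
        else:
            runs.append([p])
    return [Phase(run[0].name + "_combined", run[0].ops, run[0].ranks,
                  run[0].path, sum(p.nbytes for p in run))
            for run in runs if len(run) > 1]


MB = 1 << 20
r0 = frozenset({0})
dump1 = Phase("restart_dump_1", ("open", "write"), r0, "restart_1.out", 256 * MB)
dump2 = Phase("restart_dump_2", ("open", "write"), r0, "restart_2.out", 256 * MB)
print(combine_similar([dump1, dump2])[1])   # {'restart_dump_2': 'restart_dump_1'}

chunks = [Phase(f"read_chunk_{i}", ("read", "send"), r0, "restart.in", MB)
          for i in range(4)]
new = coalesce_successive(chunks)
print(new[0].name, new[0].nbytes // MB)     # read_chunk_0_combined 4
```

Following the text, the phases built by `coalesce_successive` are meant to be added to the trace file alongside the originals, so that both the small-chunk and large-chunk forms can be benchmarked and compared.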

Benchmark file and input file settings: Two files are fed into the benchmark suite to control its operation. The first file is the benchmark trace file (benchmark application), with details of the I/O phases for the application, as well as any new or modified phases generated during the I/O Phase Analysis stage, as described above. This file defines the type of benchmarks to be run by the benchmark suite. The second file is a general input file which sets more fine-grained control parameters for the benchmarks.

FIG. 11 shows a sample input file for the example program and benchmark trace file described earlier. The key points for this file are:

-   the location of the benchmark trace file is specified under the benchmark_file parameter;
-   the enable_phase and disable_phase parameters together request that only tests related to the initialization and restart_dump_1 I/O phases be run;
-   the application_layer parameter specifies that parameter sweep tests at this layer should include tests for the serial I/O, parallel I/O and I/O server programming models, as well as tests for varying values of transaction size and I/O buffer size;
-   the api_layer parameter specifies that only the POSIX API tests should be run at this layer;
-   the fileSystem_layer parameter lists the different file systems and the paths where the I/O tests will be run; here the tests are run on an NFS and a Lustre file system.
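The actual syntax of FIG. 11 is not reproduced here. The sketch below therefore assumes a hypothetical `key = value1, value2` text format, reusing the parameter names listed above, and shows how enable_phase and disable_phase could combine to select the active phases. The `all` keyword, the file paths and the value spellings are assumptions.

```python
SAMPLE = """
benchmark_file    = ./app_trace.xml
enable_phase      = all
disable_phase     = restart_dump_2
application_layer = serial, parallel, io_server
api_layer         = posix
fileSystem_layer  = nfs:/mnt/nfs/bench, lustre:/mnt/lustre/bench
"""


def parse_input(text: str) -> dict:
    """Parse 'key = v1, v2, ...' lines into {key: [values]}; '#' starts a comment."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return settings


def active_phases(settings: dict, trace_phases: list) -> list:
    """Phases listed under enable_phase ('all' = every phase) minus disable_phase."""
    enabled = settings.get("enable_phase", ["all"])
    if enabled == ["all"]:
        enabled = trace_phases
    disabled = set(settings.get("disable_phase", []))
    return [p for p in trace_phases if p in enabled and p not in disabled]


cfg = parse_input(SAMPLE)
print(active_phases(cfg, ["initialization", "restart_dump_1", "restart_dump_2"]))
# ['initialization', 'restart_dump_1']
```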

Benchmark Suite: FIG. 12 shows an exemplary flowchart for the I/O benchmark suite component according to an embodiment. As shown, the input file and benchmark trace file are first read to set the benchmark settings and control parameters. This information is used to initialize the benchmark runtime environment; this includes setting up the parallel environment (e.g. MPI processes, threading environment) and creating directories or files in each file system to be tested. There is then a loop over each active I/O phase in the benchmark trace file (active means that the I/O phase is listed under the enable_phase parameter in the input file). For each phase, the benchmark is first run with the same settings as in the trace file, in order to obtain a baseline time for the operation and to compare it with the timing information from the trace file. In the next steps of the flowchart, a benchmark test is constructed and run for each combination of settings across the three layers of the I/O hierarchy. The time of each test is saved for comparison and analysis against the baseline time. The benchmark kernels/tests fit into a benchmarking framework which ensures that reproducible and statistically accurate performance results are generated for each test.
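The loop structure of FIG. 12 can be sketched as a parameter sweep built with `itertools.product`. This assumes that each layer's options are given as a list and that a `run_test(phase, settings)` callable returns an elapsed time. `fake_test` returns invented numbers purely so that the example runs; they are not measured results.

```python
import itertools


def run_suite(phases, layers, run_test, baseline_settings):
    """For each active phase, time a baseline with the traced settings, then
    one test per combination of settings across the I/O layers."""
    results = {}
    names = list(layers)
    for phase in phases:
        baseline = run_test(phase, baseline_settings)
        tests = []
        for combo in itertools.product(*(layers[n] for n in names)):
            settings = dict(zip(names, combo))
            tests.append((settings, run_test(phase, settings)))
        results[phase] = {"baseline": baseline, "tests": tests}
    return results


def fake_test(phase, settings):
    """Stand-in for timing a real kernel (made-up numbers, illustration only)."""
    t = 10.0
    if settings["filesystem"] == "lustre":
        t /= 2
    if settings["application"] == "parallel":
        t /= 2
    return t


layers = {
    "application": ["serial", "parallel", "io_server"],
    "api": ["posix"],
    "filesystem": ["nfs", "lustre"],
}
traced = {"application": "serial", "api": "posix", "filesystem": "nfs"}
res = run_suite(["initialization"], layers, fake_test, traced)
best = min(res["initialization"]["tests"], key=lambda st: st[1])
print(res["initialization"]["baseline"], best)
# 10.0 ({'application': 'parallel', 'api': 'posix', 'filesystem': 'lustre'}, 2.5)
```

The number of tests per phase is the product of the option counts at each layer (3 × 1 × 2 = 6 here), which is why reducing the number of phases beforehand pays off.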

Before running each benchmark kernel, the framework sets a minimum target time for which the kernel should run. A number of test repetitions are then performed. If the execution time of these repetitions is less than the target time, the result is discarded and the test is re-run with twice the number of repetitions. The framework is also responsible for setting up the environment before the test kernel is run; this includes creating files of the correct format and size for write tests.
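The repetition-doubling rule maps directly onto a loop. The sketch below assumes that the kernel is a zero-argument callable and uses `time.perf_counter` for timing. The `max_reps` cap is an added safeguard and is not part of the described framework.

```python
import time


def run_to_target(kernel, target_time: float, reps: int = 1,
                  max_reps: int = 1 << 20):
    """Repeat `kernel` until a batch of `reps` runs lasts at least
    `target_time` seconds, discarding each short batch and doubling `reps`.
    Returns (reps, total_seconds) for the accepted batch."""
    while True:
        start = time.perf_counter()
        for _ in range(reps):
            kernel()
        elapsed = time.perf_counter() - start
        if elapsed >= target_time or reps >= max_reps:   # max_reps: safety cap
            return reps, elapsed
        reps *= 2


reps, total = run_to_target(lambda: time.sleep(0.002), target_time=0.01)
print(reps, total / reps)   # repetitions used and mean seconds per run
```

Only the final, sufficiently long batch contributes to the reported time, which reduces the relative impact of timer resolution and start-up noise.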

FIG. 13 shows an example of the pseudo-code for the initialization phase of the sample code shown in FIG. 6 above. In this Figure, lines 2-5 and 9-16 of the code in process 0, and lines 2 and 6-8 in process 1 and each subsequent process (p), are provided by the benchmark suite framework, while the unformatted code needs to be written for each kernel test. The code for each test can be constructed either manually, by implementing a set of interfaces, or automatically, by extracting the operations from the benchmark trace file.

Recommendation engine: benchmark result analysis & reporting. In an embodiment, the invention further includes a recommendation engine which identifies the optimal settings and the type of system on which to run the application. The recommendation engine may consist of two parts: benchmark result analysis and recommendation reporting.

Benchmark result analysis: At the end of each benchmark kernel execution, information from the run is extracted and saved in order to generate a report and recommendations on the optimal I/O settings. The information saved includes details of which I/O phase in the benchmark trace file was tested, the configuration settings for the kernel (e.g. the settings at each layer of the I/O hierarchy) and any performance information for the test, e.g. benchmark execution time or I/O bandwidth achieved.

The information can be saved to a text file or a database. This allows the benchmark data to be accessed and analyzed across different invocations of the benchmark suite. Alternatively, the information can be kept in working memory, used for this single invocation of the benchmark suite and then discarded after the reporting stage. FIG. 14 shows an example of the flow for saving benchmark results after each benchmark kernel completes. After the benchmark suite finishes, the saved results are analyzed to find the best settings and configuration for each I/O phase. FIG. 15 shows an exemplary flowchart of this process.
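Saving results to a text file and then picking the best configuration for each phase (the flows of FIG. 14 and FIG. 15) might look like the following sketch. It assumes JSON as the text-file format and execution time as the only metric. The `Result` record and the timings are made up for illustration, and a database backend is not shown.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class Result:
    phase: str        # I/O phase from the benchmark trace file
    settings: dict    # configuration at each layer of the I/O hierarchy
    time_s: float     # measured benchmark execution time in seconds


def save_results(results, path):
    """Persist results as JSON so that later invocations can reuse them."""
    with open(path, "w") as f:
        json.dump([asdict(r) for r in results], f)


def load_results(path):
    with open(path) as f:
        return [Result(**d) for d in json.load(f)]


def best_per_phase(results):
    """Fastest configuration found for each I/O phase."""
    best = {}
    for r in results:
        if r.phase not in best or r.time_s < best[r.phase].time_s:
            best[r.phase] = r
    return best


runs = [
    Result("initialization", {"filesystem": "nfs"}, 12.0),
    Result("initialization", {"filesystem": "lustre"}, 4.5),
    Result("restart_dump_1", {"filesystem": "nfs"}, 8.0),
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.json")
    save_results(runs, path)
    best = best_per_phase(load_results(path))
print({phase: r.settings for phase, r in best.items()})
# {'initialization': {'filesystem': 'lustre'}, 'restart_dump_1': {'filesystem': 'nfs'}}
```

Keeping the results in memory instead, as the text also allows, would simply skip the save and load steps and pass `runs` directly to `best_per_phase`.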

Recommendation reporting: Once the optimal settings for each I/O phase are found, a report is generated. This report is output in a standard format and contains details about the I/O phase, the benchmarks run for the I/O phase and the settings at each layer of the I/O hierarchy that achieve improved performance.

The report can be fed back to developers to suggest possible areas in which to improve the I/O performance of their application. The report can also help to identify potentially optimal settings for the application running on a particular system (such as the best hardware or file system on which to run the application). This information could be added to a job scheduler profile or HPC workflow and used to allow automatic execution of the application with the optimal I/O settings for the system.

Although aspects of the invention are discussed separately, it should be understood that features and consequences thereof discussed in relation to one aspect are equally applicable to the other aspects. Where a method feature is discussed, it is taken for granted that the apparatus embodiments include a unit or apparatus configured to perform that feature or provide appropriate functionality, and that programs are configured to cause a computing apparatus on which they are being executed to perform said method feature.

Although a few embodiments have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.

What is claimed is:
 1. A method of producing a benchmark application for testing input and output (I/O) settings of a computer application, the method comprising: compiling trace data relating to operations to be executed by the computer application; grouping the trace data into one or more phases, based on different stages in an execution of the computer application to which the operations relate; identifying patterns in the trace data and comparing the patterns; producing simplified trace data in which trace data having similar patterns are combined; and outputting a benchmark application which includes the simplified trace data and information indicating where the trace data have been combined.
 2. The method of claim 1, further comprising: testing the benchmark application to establish a performance of benchmark operations of the benchmark application.
 3. The method of claim 2, wherein the testing of the benchmark application further includes testing I/O settings of the benchmark application to identify one of or both of characteristics of a system and inefficiencies in data flow in the system, on which the computer application is executed.
 4. The method of claim 3, further comprising: outputting a report indicating one of or both of performance of the benchmark operations and comprising recommendations for I/O settings, specific to each system on which the computer application is operable to be executed, for use in improving data flow.
 5. The method of claim 4, wherein the report comprises recommendations regarding at least one of access patterns, types of files to be read and written, sizes of files to be read and written, transaction sizes, and timings.
 6. The method of claim 5, wherein the testing of the benchmark application includes studying an effect on performance of changing I/O settings.
 7. The method of claim 6, wherein, where trace data having similar patterns have been combined, the testing of the benchmark application involves only testing a first instance of the trace data having a similar pattern and a result of testing the first instance of the trace data having a similar pattern is then applied to subsequent instances.
 8. The method of claim 7, wherein the testing is carried out on the benchmark application in a benchmark suite and comprises inputting the benchmark application into the benchmark suite along with a user-modifiable input file, which sets control parameters of the benchmark suite.
 9. The method of claim 8, wherein the user-modifiable input file designates one of or both of phases to be tested and specifies a target execution time for each benchmark suite, and identifies trace data relating to any operations to be excluded from the testing.
 10. The method of claim 9, wherein the compiling of the trace data further comprises extracting a chronology of operations that occur during execution of the computer application.
 11. The method of claim 10, wherein the trace data comprises information indicating one of or both of a type and size of the data, relating to the operations, being read or written.
 12. The method of claim 11, wherein the trace data are grouped into at least an initialization phase, relating to reading input files, and a finalization phase, relating to writing re-start files, which allow operations to re-start following completion of testing.
 13. The method of claim 12, wherein comparing patterns between phases comprises identifying phases including similar sets of operations which each involve accessing a same file.
 14. The method of claim 13, wherein producing the simplified trace data involves maintaining a first instance of a specific pattern in the simplified trace data and replacing subsequent instances, having similar patterns to the first instance, with an indication that a subsequent instance has been combined with the first instance.
 15. An apparatus for producing a benchmark application for testing input and output (I/O) settings of a computer application, the apparatus comprising: means for compiling trace data relating to operations to be executed by the computer application; means for grouping the trace data into one or more phases, based on different stages in an execution of the computer application to which the operations relate; means for identifying patterns in the trace data and comparing the patterns; means for producing simplified trace data in which trace data having similar patterns are combined; and means for outputting a benchmark application which includes the simplified trace data and information indicating where the trace data have been combined.